Feature Selection by Using Classification and Regression Trees (CART)
Abstract
Hyperspectral remote sensing increases the volume of information available for research and practice, but it also creates a need for statistical methods that remain efficient in sample spaces of many dimensions. Because of the complexity of high-dimensional problems, several dimension-reduction methods have been proposed in the literature, such as Principal Components Analysis (PCA). Although PCA can be applied to data reduction, its use for classifying images has not produced good results. In the present study, the Classification and Regression Trees technique, more widely known by the acronym CART, is used for feature selection. CART involves the identification and construction of a binary decision tree using a sample of training data for which the correct classification is known. A binary decision tree repeatedly divides the feature space into two sub-spaces, and its terminal nodes are associated with the classes. A desirable decision tree has relatively few branches, relatively few intermediate nodes from which these branches diverge, and high predictive power, meaning that entities are correctly classified at the terminal nodes. In the present study, AVIRIS digital images of agricultural fields in the USA are used. The images were automatically classified by a binary decision tree. Based on the results of the digital classification, a table was generated showing the highly discriminatory spectral bands for each kind of agricultural field. The spectral signatures of the crops are also discussed. The results show that decision trees employ a strategy in which a complex problem is divided into simpler sub-problems, with the advantage that the classification process can be followed through each node of the tree. Notably, it is the algorithm itself that selects the bands with maximum discriminatory power, which gives the researcher useful information.
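The idea that the tree-growing algorithm itself picks the most discriminatory bands can be illustrated with a minimal sketch. It assumes Gini impurity as the splitting criterion (the abstract does not say which criterion was used) and runs on a tiny invented data set: the four "bands", the reflectance values, and the class names `corn`, `soy`, and `wheat` are hypothetical and are not taken from the AVIRIS images.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (band, threshold, weighted_gini) of the best binary split, or None."""
    best = None
    n = len(y)
    for band in range(len(X[0])):
        values = sorted(set(row[band] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][band] <= t]
            right = [y[i] for i in range(n) if X[i][band] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (band, t, score)
    return best

def grow(X, y, depth=0, max_depth=3):
    """Recursively grow a binary tree; a leaf is simply a class label."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority
    split = best_split(X, y)
    if split is None:  # no band has two distinct values left
        return majority
    band, t, _ = split
    left = [i for i in range(len(y)) if X[i][band] <= t]
    right = [i for i in range(len(y)) if X[i][band] > t]
    return {
        "band": band,
        "threshold": t,
        "left": grow([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": grow([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }

def bands_used(tree):
    """Collect the indices of all bands that appear in internal nodes."""
    if not isinstance(tree, dict):
        return set()
    return {tree["band"]} | bands_used(tree["left"]) | bands_used(tree["right"])

def predict(tree, pixel):
    """Follow the splits from the root down to a terminal node."""
    while isinstance(tree, dict):
        side = "left" if pixel[tree["band"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree

# Hypothetical toy pixels: 4 bands each; bands 2 and 3 carry no class signal.
X = [
    [0.10, 0.50, 0.20, 0.31],
    [0.12, 0.52, 0.80, 0.30],
    [0.11, 0.90, 0.25, 0.29],
    [0.10, 0.88, 0.75, 0.33],
    [0.60, 0.51, 0.22, 0.30],
    [0.62, 0.89, 0.78, 0.32],
]
y = ["corn", "corn", "soy", "soy", "wheat", "wheat"]

tree = grow(X, y)
selected = bands_used(tree)
print("bands selected by the tree:", sorted(selected))
```

On this toy data the tree splits first on band 0 and then on band 1, and never uses bands 2 and 3. Reading off the bands that appear in the internal nodes is the sense in which the tree performs feature selection, and each prediction can be traced node by node through `predict`.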
Similar resources
Factors Influencing Drug Injection History among Prisoners: A Comparison between Classification and Regression Trees and Logistic Regression Analysis
Background: Due to the importance of medical studies, researchers in this field should be familiar with various types of statistical analyses so that they can select the most appropriate method based on the characteristics of their data sets. Classification and regression trees (CARTs) can serve as a complement to regression models. We compared the performance of a logistic regression model and a CART in predi...
Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection
With the rise of technology, the possibility of fraud in areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to enhance the accuracy of credit card fraud detection. After selecting the best and most effec...
CART-based feature selection of hyperspectral images for crop cover classification
In this paper, we propose a procedure to reduce data dimensionality while preserving the information relevant to subsequent crop cover classification. The huge amount of data involved in hyperspectral image processing is one of the main obstacles to applying pattern recognition techniques. We propose a dimensionality reduction strategy that eliminates redundant information and a subsequent sel...
Apple Stem and Calyx Recognition by Decision Trees
In this paper, a decision tree-based approach for recognizing stem and calyx regions of apples by computer vision is proposed. The method starts with background removal and object segmentation by thresholding. Statistical, textural and shape features are extracted from each segmented object and these features are introduced to two decision tree algorithms: CART and C4.5. Feature selection is ac...
Margin Adaptive Risk Bounds for Classification Trees
Margin adaptive risk bounds for Classification and Regression Trees (CART, Breiman et al., 1984) classifiers are obtained in the binary supervised classification framework. These risk bounds are obtained conditionally on the construction of the maximal deep binary tree and make it possible to prove that the linear penalty used in the CART pruning algorithm is valid under a margin condition. It is also show...